This document aims to provide a short introduction to statistical analysis with R. The assumption is that the reader is already familiar with basic concepts in R and the RStudio IDE, and has some basic knowledge about statistics, for example from using SPSS. The goal here is to show how one can work through a typical analysis, from loading and cleaning the data to fitting and plotting regression models.
The name R refers at the same time to three things: the programming language, the software that runs it, and the ecosystem of packages built on top of both.
R packages are written (typically in R, the programming language) by researchers and are available for free. These packages are crucial, because the capabilities of the base R software are limited.
There are thousands of R packages. Therefore, the CRAN task view pages are useful for getting an overview of R packages for different application areas. CRAN stands for “Comprehensive R Archive Network”.
For this introduction we will use the following packages:
- haven reads SPSS (or Stata) data into R
- ggplot2 facilitates plotting
- psych is a package for factor analysis and related methods
- sjPlot makes plotting of regression analysis results easy
- VIM provides missing-data visualization and imputation
Installing packages is easy!
install.packages("haven")
install.packages("ggplot2")
install.packages("psych")
install.packages("sjPlot")
install.packages("VIM")
One can also install multiple packages simultaneously, as in install.packages(c("haven","psych")).
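Note that install.packages downloads a package only once, while library loads it into every new R session. A small sketch (using only the package names from this tutorial):

```r
# install.packages() is needed only once per machine;
# library() must be called in every new R session
library(haven)

# requireNamespace() checks availability without attaching the package,
# which is handy for installing only what is missing:
if (!requireNamespace("psych", quietly = TRUE)) {
  install.packages("psych")
}
```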
The problem we are looking at is the association between Parenting Style and child mental health symptoms. More specifically, we look at the association between the self-reported Parenting Style of well-educated Norwegian mothers at child age 5 years and oppositional behavior, as well as mood and feelings, at age 8. These data can be obtained from MoBa questionnaires 5 and 8.
If these data are available in SPSS (or Stata) format, we can read them with the haven package, which we first have to load before we can use it:
library(haven)
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## Loading required package: data.table
## VIM is ready to use.
## Since version 4.0.0 the GUI is in its own package VIMGUI.
##
## Please use the package to use the new (and old) GUI.
## Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
All packages come with documentation, and if one is lucky also with one or more vignettes, which are tutorial-style documents that show how the functions in the package can be used. This and more information about the packages can be found at https://cran.rstudio.com/web/packages/haven/. For other packages, just replace “haven” with the name of the package you are interested in.
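From within R, the same documentation is only a command away; for example (assuming haven is installed):

```r
help(package = "haven")   # index of the package's help pages
?read_sav                 # help page for a single function
browseVignettes("haven")  # list the package's vignettes, if any
```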
OK, now we can load the data:
data_directory = "data/"
Q5aar = read_sav(paste0(data_directory,
"PDB439_Skjema5aar_v10.sav"))
Q5aar = data.frame(Q5aar)
So what have we done here? We first created a new variable data_directory and set it to the path of the directory where the data files are. Then, when loading the data into the data.frame Q5aar, we concatenated the directory path with the name of the data file using the paste0 command.
The reason for doing this is that the code becomes a bit easier to read (still not great). For the same reason, I made a line break at the end of Q5aar = read_sav(paste0(data_directory,.
Note that if we were already in the same directory as the data file, we could just have written Q5aar = read_sav("PDB439_Skjema5aar_v10.sav"). You can use the commands getwd() and setwd() to check and set your working directory (wd).
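As a side note, paste0 concatenates strings with no separator, while the base function file.path inserts the path separator for you; either works here:

```r
data_directory = "data/"
paste0(data_directory, "PDB439_Skjema5aar_v10.sav")
# [1] "data/PDB439_Skjema5aar_v10.sav"

# file.path() adds the "/" itself:
file.path("data", "PDB439_Skjema5aar_v10.sav")
# [1] "data/PDB439_Skjema5aar_v10.sav"
```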
Next we select the variables we need. These are the pregnancy ID and the child number, as well as the items for measuring parenting style (we get the SPSS variable names from here).
Q5aar = Q5aar[,c("PREG_ID_439","BARN_NR",
paste0("LL",526:534))]
names(Q5aar)
## [1] "PREG_ID_439" "BARN_NR" "LL526" "LL527" "LL528"
## [6] "LL529" "LL530" "LL531" "LL532" "LL533"
## [11] "LL534"
OK, these variable names are not very clear, so let's rename them:
names(Q5aar)[3:11] = paste0("PS.item.",1:9)
names(Q5aar)
## [1] "PREG_ID_439" "BARN_NR" "PS.item.1" "PS.item.2" "PS.item.3"
## [6] "PS.item.4" "PS.item.5" "PS.item.6" "PS.item.7" "PS.item.8"
## [11] "PS.item.9"
It is important to understand the coding of the variables. The haven package makes SPSS’ variable label and value label information available as attributes of the variables in the data.frame. We can access those as follows:
attr(Q5aar$PS.item.1,"label")
## [1] "W_53_1:SKJEMA_5AARB; Du lar barnet ditt forstå når han/hun gjør en god innsats; 53. Hvor ofte hender dette hjemme hos dere?"
attr(Q5aar$PS.item.1,"labels")
## Mer enn ett kryss Aldri Nesten aldri Noen ganger
## 0 1 2 3
## Ofte Alltid
## 4 5
Here we see one annoying detail of the SPSS MoBa data: if the mother has ticked more than one box, the value is coded as 0, but for our purposes such responses are missing, not zero. We need to fix that.
To do this, we use a quick for loop:
item_levels = attr(Q5aar$PS.item.1,"labels")[-1]
item_names = names(Q5aar)[-c(1,2)]
item_quest = c()
for (v in item_names) {
item_quest = c(item_quest,
strsplit(attr(Q5aar[,v],"label"),
split = ";")[[1]][2])
attr(Q5aar[, v],"labels") = attr(Q5aar[, v],"labels")[-1]
# need to use the "which", otherwise R stumbles over missing values
value_is_0 = which(Q5aar[,v] == 0)
Q5aar[value_is_0, v] = NA
}
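The comment about which() in the loop is worth a small illustration: comparing a vector that contains NA produces NA, and which() is what drops those NAs before we index:

```r
x = c(0, NA, 4)
x == 0            # TRUE NA FALSE -- the comparison propagates NA
which(x == 0)     # 1             -- which() keeps only the TRUE positions
x[which(x == 0)] = NA
x                 # NA NA 4
```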
table(as_factor(Q5aar$PS.item.1))
##
## Aldri Nesten aldri Noen ganger Ofte Alltid
## 9 7 359 13795 12110
Missing data is always a problem, especially with questionnaire data from long-running studies such as MoBa. Let's look at missing data by showing a table that counts the missing values:
table_plus_missing = table(addNA(as_factor(Q5aar$PS.item.1)))
barplot(table_plus_missing)
table_plus_missing
##
## Aldri Nesten aldri Noen ganger Ofte Alltid
## 9 7 359 13795 12110
## <NA>
## 15332
There is a lot of missing data, probably because MoBa changed the Parenting Style items over time.
Let's look a little closer at the missing data:
is_missing = is.na(Q5aar[,-c(1,2)])
hist(rowSums(is_missing))
Indeed, there is a substantial number of cases for which all Parenting Style items are missing. Let's remove those participants. In fact, to facilitate any further analysis, we'll only use the complete cases.
Q5aar = Q5aar[complete.cases(Q5aar),]
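complete.cases returns TRUE for rows without any NA, so the line above keeps only fully observed rows. A toy example:

```r
toy = data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
complete.cases(toy)         # TRUE FALSE FALSE
toy[complete.cases(toy), ]  # only the first row survives
```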
Our goal is to reduce the 9 items to a smaller set of variables describing parenting style. One way to do this is to use exploratory factor analysis (EFA) to learn the factor structure of the data. As a first step we'll check how many factors we need, using the fa.parallel command from the psych package.
As with many things in R, using up-to-date methods is not hard, as you'll see. The problem is more in knowing that these methods exist. If you do not know which method to use, try searching online; the challenge is then to find out which of the many similar packages one should use.
Now let's move on with the exploratory factor analysis. One thing we need to specify is that polychoric correlations should be used, because we have ordinal data.
library(psych)
describe(Q5aar[,-c(1,2)])
## vars n mean sd median trimmed mad min max range skew
## PS.item.1 1 25059 4.45 0.53 4 4.45 0.00 1 5 4 -0.19
## PS.item.2 2 25059 2.14 0.77 2 2.14 1.48 1 5 4 0.17
## PS.item.3 3 25059 4.19 0.45 4 4.14 0.00 1 5 4 0.64
## PS.item.4 4 25059 2.04 0.83 2 2.01 1.48 1 5 4 0.34
## PS.item.5 5 25059 4.77 0.45 5 4.86 0.00 1 5 4 -1.94
## PS.item.6 6 25059 4.65 0.49 5 4.70 0.00 1 5 4 -0.80
## PS.item.7 7 25059 4.47 0.57 5 4.51 0.00 1 5 4 -0.57
## PS.item.8 8 25059 4.22 0.59 4 4.26 0.00 1 5 4 -0.25
## PS.item.9 9 25059 2.33 0.78 2 2.36 1.48 1 5 4 -0.02
## kurtosis se
## PS.item.1 -0.67 0.00
## PS.item.2 -0.37 0.00
## PS.item.3 1.18 0.00
## PS.item.4 -0.45 0.01
## PS.item.5 4.63 0.00
## PS.item.6 -0.81 0.00
## PS.item.7 -0.29 0.00
## PS.item.8 0.33 0.00
## PS.item.9 -0.32 0.00
fa.parallel(Q5aar[,-c(1,2)], cor = "poly")
## Warning in matpLower(x, nvar, gminx, gmaxx, gminy, gmaxy): 33 cells were
## adjusted for 0 values using the correction for continuity. Examine your
## data carefully.
## Parallel analysis suggests that the number of factors = 4 and the number of components = 2
Four factors. I had hoped for fewer, but let's continue and do an exploratory factor analysis.
fa_ps = fa(Q5aar[,-c(1,2)],
nfactors = 4,
cor = "poly")
## Warning in matpLower(x, nvar, gminx, gmaxx, gminy, gmaxy): 33 cells were
## adjusted for 0 values using the correction for continuity. Examine your
## data carefully.
## Loading required namespace: GPArotation
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : I am sorry, to do these rotations requires the GPArotation
## package to be installed
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : An ultra-Heywood case was detected. Examine the results carefully
fa_ps
## Factor Analysis using method = minres
## Call: fa(r = Q5aar[, -c(1, 2)], nfactors = 4, cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 MR3 MR4 h2 u2 com
## PS.item.1 0.77 0.00 0.39 0.36 0.88 0.1228 1.9
## PS.item.2 -0.26 0.68 0.04 -0.02 0.54 0.4633 1.3
## PS.item.3 0.61 -0.04 -0.06 0.08 0.38 0.6156 1.1
## PS.item.4 -0.20 0.57 0.07 0.05 0.37 0.6309 1.3
## PS.item.5 0.63 0.06 -0.10 -0.18 0.45 0.5529 1.2
## PS.item.6 0.92 0.12 0.22 -0.31 1.01 -0.0061 1.4
## PS.item.7 0.80 0.15 0.08 -0.05 0.67 0.3345 1.1
## PS.item.8 0.79 0.20 -0.54 0.16 0.99 0.0126 2.0
## PS.item.9 -0.16 0.69 0.02 0.02 0.50 0.5038 1.1
##
## MR1 MR2 MR3 MR4
## SS loadings 3.62 1.34 0.52 0.30
## Proportion Var 0.40 0.15 0.06 0.03
## Cumulative Var 0.40 0.55 0.61 0.64
## Proportion Explained 0.63 0.23 0.09 0.05
## Cumulative Proportion 0.63 0.86 0.95 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 36 and the objective function was 3.93 with Chi Square of 98544.33
## The degrees of freedom for the model are 6 and the objective function was 0.02
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic number of observations is 25059 with the empirical chi square 161.52 with prob < 2.8e-32
## The total number of observations was 25059 with Likelihood Chi Square = 618.69 with prob < 2.2e-130
##
## Tucker Lewis Index of factoring reliability = 0.963
## RMSEA index = 0.064 and the 90 % confidence intervals are 0.06 0.068
## BIC = 557.91
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2 MR3 MR4
## Correlation of (regression) scores with factors 1.00 0.86 0.96 0.92
## Multiple R square of scores with factors 0.99 0.74 0.92 0.84
## Minimum correlation of possible factor scores 0.98 0.48 0.83 0.68
We can first look at the RMSEA. The value is 0.064, which is not great but acceptable.
These are the factor loadings:
factor_loadings = loadings(fa_ps)
par(mar = c(5,3,1,1))
xs = barplot(t(factor_loadings),
beside = T,
xaxt = "n",
col = c("green2","yellow","blue","red"))
axis(1,
at = colMeans(xs),
labels = paste0("Item\n",1:9), lty = 0)
This is not very clear, because a number of items (especially item 3) have high loadings on multiple factors. Moreover, items 1 and 8 stand apart. Let's redo the analysis without these items.
fa.parallel(Q5aar[,-c(1,2,3,10)], cor = "poly")
## Warning in matpLower(x, nvar, gminx, gmaxx, gminy, gmaxy): 18 cells were
## adjusted for 0 values using the correction for continuity. Examine your
## data carefully.
## Parallel analysis suggests that the number of factors = 3 and the number of components = 2
OK, now we only need 3 or maybe even only 2 factors. Let's first try with 2.
fa_ps = fa(Q5aar[,-c(1,2,3,10)],
nfactors = 2,
cor = "poly")
## Warning in matpLower(x, nvar, gminx, gmaxx, gminy, gmaxy): 18 cells were
## adjusted for 0 values using the correction for continuity. Examine your
## data carefully.
## Loading required namespace: GPArotation
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : I am sorry, to do these rotations requires the GPArotation
## package to be installed
fa_ps
## Factor Analysis using method = minres
## Call: fa(r = Q5aar[, -c(1, 2, 3, 10)], nfactors = 2, cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## PS.item.2 -0.34 0.65 0.54 0.462 1.5
## PS.item.3 0.58 0.02 0.33 0.667 1.0
## PS.item.4 -0.27 0.54 0.36 0.635 1.5
## PS.item.5 0.63 0.13 0.42 0.583 1.1
## PS.item.6 0.94 0.26 0.94 0.058 1.2
## PS.item.7 0.77 0.24 0.66 0.344 1.2
## PS.item.9 -0.24 0.66 0.49 0.506 1.3
##
## MR1 MR2
## SS loadings 2.45 1.29
## Proportion Var 0.35 0.18
## Cumulative Var 0.35 0.53
## Proportion Explained 0.65 0.35
## Cumulative Proportion 0.65 1.00
##
## Mean item complexity = 1.2
## Test of the hypothesis that 2 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 2.51 with Chi Square of 62866.52
## The degrees of freedom for the model are 8 and the objective function was 0.03
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 25059 with the empirical chi square 268.66 with prob < 1.9e-53
## The total number of observations was 25059 with Likelihood Chi Square = 766.98 with prob < 2.7e-160
##
## Tucker Lewis Index of factoring reliability = 0.968
## RMSEA index = 0.062 and the 90 % confidence intervals are 0.058 0.065
## BIC = 685.95
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.97 0.86
## Multiple R square of scores with factors 0.94 0.74
## Minimum correlation of possible factor scores 0.88 0.48
The RMSEA and, even more so, the TLI suggest that 2 factors should be OK.
These are the new factor loadings:
factor_loadings = loadings(fa_ps)
par(mar = c(5,3,1,1))
xs = barplot(t(factor_loadings),
beside = T,
xaxt = "n",
col = c("green2","yellow"))
axis(1,
at = colMeans(xs),
labels = paste0("Item\n",(1:9)[-c(1,8)]), lty = 0)
This looks like a clear factor structure. Let's look at the items to understand what those factors might measure.
simple_loadings = apply(factor_loadings,
1,
function(x)
(abs(x) == max(abs(x))))
item_quest_reduced = item_quest[-c(1,8)]
items_factor1 = which(simple_loadings[1,] == T)
items_factor2 = which(simple_loadings[2,] == T)
cat(paste0("Factor1:\n",
paste(item_quest_reduced[items_factor1],
collapse = "\n"),
"\n"))
## Factor1:
## Du har en hyggelig samtale med barnet ditt
## Du spør barnet ditt om hvordan hans/hennes dag i barnehagen har vært
## Du sier noe pent til barnet ditt når han/hun har gjort noe bra
## Du roser barnet ditt om han/hun oppfører seg pent
cat(paste0("Factor2:\n",
paste(item_quest_reduced[items_factor2],
collapse = "\n"),
"\n"))
This looks as if there is one factor around positive reinforcement and communication with the child; let's call this positive parenting. The other factor appears to be about the inability to follow through with threats of punishment; let's call this inconsistent parenting ;-). If we scroll back and look at the correlation of the two factors, it looks as if positive parenting has a weak negative correlation with inconsistent parenting.
Now, if we want to use the parenting dimensions as predictors, we need to extract the factor scores. Factor scores are stored in the fa_ps object, which holds the results of our exploratory factor analysis. If we want to know what this object contains, we can simply click on it in the “Environment” tab on the right-hand side of the RStudio IDE.
factor_scores = fa_ps$scores
colnames(factor_scores) = c("positive","inconsistent")
Let's quickly look at the factor scores before we add them to the data.frame with the 5-year data:
par(mfrow = c(1,2))
hist(factor_scores[,1],
breaks = 25,
main = "positive")
hist(factor_scores[,2],
breaks = 25,
main = "inconsistent")
Q5aar = cbind(Q5aar,factor_scores)
If we had a great parenting scale, the histograms should look normally distributed. Especially the histogram for positive parenting deviates from that: it shows that (a) there are a few extreme outliers to the left and (b) there seems to be something like a ceiling effect. That is, a good number of mothers report being VERY positive in their parenting. This is another instance where it would be great to have a social desirability scale in MoBa.
Anyhow, let's continue and put together the outcome data. For this, we load the MoBa 8-year data file.
data_directory = "data/"
Q8aar = read_sav(paste0(data_directory,
"PDB439_Skjema8aar_v10.sav"))
Q8aar = Q8aar[,c("PREG_ID_439","BARN_NR",
paste("NN",c(68:80, # SMFQ
211:226, # CCC-2 sort
227:233, 374, # Språåk 20
111:118, # CD
119:136, # ADHD
137:144), # OD
sep = ""))]
Q8aar = data.frame(Q8aar)
Now we come to a part which always takes more time than one would wish: cleaning the data. We want to calculate sum scores, but this is complicated by two facts: MoBa data come with
- uninformative variable names, and
- 0 instead of NA for ambiguous responses.
We will address these problems in that order, starting with renaming some variables. To make this easier, we'll create simple functions where this is useful.
First we rename all variables:
rename_variables = function(df,old_names,new_names) {
names(df)[names(df) %in% old_names] = new_names
return(df)
}
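A quick sanity check of the helper on a toy data.frame (names made up for illustration). Note that %in% preserves the column order of the data.frame, so new_names must be listed in that same order:

```r
# (helper as defined above)
rename_variables = function(df, old_names, new_names) {
  names(df)[names(df) %in% old_names] = new_names
  return(df)
}
toy = data.frame(NN1 = 1, NN2 = 2, other = 3)
toy = rename_variables(toy, c("NN1", "NN2"), c("item.1", "item.2"))
names(toy)  # "item.1" "item.2" "other"
```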
Q8aar = rename_variables(Q8aar,
paste0("NN",68:80),
paste0("SMFQ.i",1:13,"_8y"))
Q8aar = rename_variables(Q8aar,
paste0("NN",c(227:233,374)),
paste0("SPRAAK20.i",1:8,"_8y"))
Q8aar = rename_variables(Q8aar,
paste0("NN",211:226),
paste0("CCCs.i",1:16,"_8y"))
for (item in paste0("CCCs.i",c(10:16),"_8y"))
Q8aar[,item] = abs(Q8aar[,item]-5)
Q8aar = rename_variables(Q8aar,
paste0("NN",111:118),
paste0("CD.i",1:8,"_8y"))
Q8aar = rename_variables(Q8aar,
paste0("NN",119:136),
paste0("ADHD.i",1:18,"_8y"))
Q8aar = rename_variables(Q8aar,
paste0("NN",137:144),
paste0("OD.i",1:8,"_8y"))
Q8aar = rename_variables(Q8aar,
paste0("NN",145:149),
paste0("SCARED.i",1:5,"_8y"))
Next we replace all 0 values with NA and subtract 1 from all items. We use the grep command, which makes it easy for us to find all variables that match a particular pattern. We also temporarily create a new data.frame tmp_items, in which we keep the item data.
all_items = grep("\\.i[0-9]",names(Q8aar), value = T)
tmp_items = Q8aar[,all_items]
tmp_items[tmp_items == 0] = NA
tmp_items = tmp_items-1
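The regular expression "\\.i[0-9]" matches a literal dot followed by "i" and a digit, which is exactly the shape of our item names. A toy check:

```r
vars = c("SMFQ.i1_8y", "PREG_ID_439", "BARN_NR", "CD.i3_8y")
grep("\\.i[0-9]", vars, value = TRUE)
# [1] "SMFQ.i1_8y" "CD.i3_8y"
```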
Now we need to make sum scores. Let's first see if there are any missing data:
colSums(is.na(tmp_items))
## SMFQ.i1_8y SMFQ.i2_8y SMFQ.i3_8y SMFQ.i4_8y SMFQ.i5_8y
## 357 224 255 254 276
## SMFQ.i6_8y SMFQ.i7_8y SMFQ.i8_8y SMFQ.i9_8y SMFQ.i10_8y
## 218 255 273 275 267
## SMFQ.i11_8y SMFQ.i12_8y SMFQ.i13_8y CCCs.i1_8y CCCs.i2_8y
## 280 303 316 319 406
## CCCs.i3_8y CCCs.i4_8y CCCs.i5_8y CCCs.i6_8y CCCs.i7_8y
## 422 361 324 364 390
## CCCs.i8_8y CCCs.i9_8y CCCs.i10_8y CCCs.i11_8y CCCs.i12_8y
## 432 400 369 333 371
## CCCs.i13_8y CCCs.i14_8y CCCs.i15_8y CCCs.i16_8y SPRAAK20.i1_8y
## 555 362 415 454 375
## SPRAAK20.i2_8y SPRAAK20.i3_8y SPRAAK20.i4_8y SPRAAK20.i5_8y SPRAAK20.i6_8y
## 340 342 343 414 408
## SPRAAK20.i7_8y SPRAAK20.i8_8y CD.i1_8y CD.i2_8y CD.i3_8y
## 396 8502 215 198 224
## CD.i4_8y CD.i5_8y CD.i6_8y CD.i7_8y CD.i8_8y
## 134 160 155 214 174
## ADHD.i1_8y ADHD.i2_8y ADHD.i3_8y ADHD.i4_8y ADHD.i5_8y
## 212 224 273 197 270
## ADHD.i6_8y ADHD.i7_8y ADHD.i8_8y ADHD.i9_8y ADHD.i10_8y
## 209 186 228 315 232
## ADHD.i11_8y ADHD.i12_8y ADHD.i13_8y ADHD.i14_8y ADHD.i15_8y
## 216 199 215 255 216
## ADHD.i16_8y ADHD.i17_8y ADHD.i18_8y OD.i1_8y OD.i2_8y
## 244 227 250 217 258
## OD.i3_8y OD.i4_8y OD.i5_8y OD.i6_8y OD.i7_8y
## 262 229 272 223 231
## OD.i8_8y
## 227
There are missing data. We can use imputation to replace them, but I prefer to do this only for cases for which we have at least 50% of the data. So we remove participants with more than 50% missing data.
proportion_missing = rowMeans(is.na(tmp_items))
Q8aar = Q8aar[proportion_missing < .5,]
tmp_items = tmp_items[proportion_missing < .5,]
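rowMeans(is.na(...)) gives the proportion of missing items per row, which is what we filter on. A toy example:

```r
toy = data.frame(a = c(1, NA), b = c(2, NA), c = c(NA, NA), d = c(4, 5))
rowMeans(is.na(toy))  # 0.25 0.75
```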
Now we use the function hotdeck from the package VIM to do a hot-deck imputation (hotdeck is fast). Other approaches, like k-nearest neighbour imputation or multiple imputation by chained equations, are more accurate, but they are too slow when analysing bigger data sets in a tutorial.
tmp_items_imputed = hotdeck(as.matrix(tmp_items))
tmp_items_imputed = tmp_items_imputed[,1:ncol(tmp_items)]
Finally, we can calculate sum scores.
scales = c("OD","ADHD","CD","SMFQ","CCC","SPRAAK")
sum_scores = matrix(NA,
nrow = nrow(Q8aar),
ncol = length(scales))
colnames(sum_scores) = scales
par(mfrow = c(2,3))
for (s in scales) {
items = grep(s,names(tmp_items_imputed), value = T)
sum_scores[,s] = rowSums(tmp_items_imputed[,items])
hist(sum_scores[,s],
main = s,
ylab = "")
}
head(sum_scores)
## OD ADHD CD SMFQ CCC SPRAAK
## [1,] 2 5 0 1 4 0
## [2,] 1 11 0 2 2 2
## [3,] 2 7 0 0 3 0
## [4,] 0 0 0 1 1 0
## [5,] 12 16 2 5 18 15
## [6,] 6 9 0 0 5 1
This looks more or less as expected. Now we add the sum scores to the Q8aar data and merge the 5- and 8-year data.
Q8aar = cbind(Q8aar, sum_scores)
my_data = merge(Q5aar[,c("PREG_ID_439","BARN_NR",colnames(factor_scores))],
Q8aar[,c("PREG_ID_439","BARN_NR",scales)],
by = c("PREG_ID_439","BARN_NR"))
head(my_data)
## PREG_ID_439 BARN_NR positive inconsistent OD ADHD CD SMFQ CCC SPRAAK
## 1 100002 1 -0.7373562 -1.0162835 3 4 0 1 9 4
## 2 100003 1 0.8283786 -0.2782919 4 3 0 1 1 0
## 3 100004 1 -1.2626826 -1.3119950 2 5 0 1 10 1
## 4 100006 1 0.8283786 -0.2782919 3 6 0 1 1 0
## 5 100007 1 1.4545942 -0.6214863 1 0 1 0 1 0
## 6 100009 1 -1.5516668 -0.2702709 3 8 1 3 11 7
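merge performs a database-style inner join by default, so only mother–child pairs present in both questionnaires are kept. A toy illustration with made-up values:

```r
five  = data.frame(PREG_ID = c(1, 2, 3), positive = c(0.5, -0.2, 1.1))
eight = data.frame(PREG_ID = c(2, 3, 4), ADHD = c(5, 8, 2))
merge(five, eight, by = "PREG_ID")  # only PREG_ID 2 and 3 appear in both
```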
my_data$positive = as.numeric(scale(my_data$positive))
my_data$inconsistent = as.numeric(scale(my_data$inconsistent))
my_data = my_data[my_data$positive > -4,]
The last three lines standardized the two factor scores and removed the extreme left outliers on positive parenting that we saw in the histogram. Finally, we can fit simple linear regression models:
ADHD_model = lm(ADHD ~ positive + inconsistent, my_data)
summary(ADHD_model)
##
## Call:
## lm(formula = ADHD ~ positive + inconsistent, data = my_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.958 -4.686 -1.547 2.734 43.798
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.52605 0.05232 162.96 <2e-16 ***
## positive -0.90625 0.05281 -17.16 <2e-16 ***
## inconsistent 0.83806 0.05244 15.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.899 on 17383 degrees of freedom
## Multiple R-squared: 0.02919, Adjusted R-squared: 0.02907
## F-statistic: 261.3 on 2 and 17383 DF, p-value: < 2.2e-16
This is exciting! Parenting style at age 5 predicts ADHD symptoms at age 8!
Let's plot this for ADHD and the other outcomes!
library(margins)
par(mar=c(3,3,2,1), mgp=c(2,.5,0), tck=-.01)
pcf = function(m) {
bci = cbind(coef(m),
confint(m))
y = 1:nrow(bci)
par(mar = c(3,10,1,1))
plot(bci[,1],y,
xlim = range(bci),
pch = 16,
yaxt = "n",
ylab = "",
xlab = "coefficient value")
segments(x0 = bci[,2], x1 = bci[,3], y0 = y)
axis(2,at = y, labels = rownames(bci), las = 2)
abline(v = 0, lty = 2, col = "red")
}
par(mfrow = c(2,2))
hist(residuals(ADHD_model))
pcf(ADHD_model)
par(mar=c(3,3,2,1), mgp=c(2,.5,0), tck=-.01)
cplot(ADHD_model,
"positive",
what = "prediction",
ylim = quantile(my_data$ADHD,c(.025,.975)))
## xvals yvals upper lower
## 1 -3.93443222 12.092244 12.512471 11.672017
## 2 -3.70448310 11.883853 12.281040 11.486667
## 3 -3.47453399 11.675463 12.049705 11.301221
## 4 -3.24458487 11.467072 11.818483 11.115661
## 5 -3.01463575 11.258682 11.587400 10.929963
## 6 -2.78468664 11.050291 11.356485 10.744097
## 7 -2.55473752 10.841901 11.125779 10.558022
## 8 -2.32478841 10.633510 10.895334 10.371686
## 9 -2.09483929 10.425119 10.665224 10.185015
## 10 -1.86489017 10.216729 10.435548 9.997910
## 11 -1.63494106 10.008338 10.206445 9.810232
## 12 -1.40499194 9.799948 9.978115 9.621780
## 13 -1.17504282 9.591557 9.750850 9.432264
## 14 -0.94509371 9.383167 9.525075 9.241258
## 15 -0.71514459 9.174776 9.301404 9.048148
## 16 -0.48519548 8.966385 9.080686 8.852085
## 17 -0.25524636 8.757995 8.863955 8.652035
## 18 -0.02529724 8.549604 8.652189 8.447020
## 19 0.20465187 8.341214 8.445869 8.236559
## 20 0.43460099 8.132823 8.244693 8.020953
cplot(ADHD_model,
"inconsistent",
what = "prediction",
ylim = quantile(my_data$ADHD,c(.025,.975)))
## xvals yvals upper lower
## 1 -3.2602006 5.791451 6.141953 5.440950
## 2 -2.9241731 6.073063 6.390699 5.755427
## 3 -2.5881456 6.354675 6.639841 6.069509
## 4 -2.2521181 6.636287 6.889530 6.383043
## 5 -1.9160906 6.917899 7.140003 6.695794
## 6 -1.5800631 7.199510 7.391642 7.007379
## 7 -1.2440356 7.481122 7.645086 7.317158
## 8 -0.9080080 7.762734 7.901441 7.624027
## 9 -0.5719805 8.044346 8.162585 7.926107
## 10 -0.2359530 8.325958 8.431348 8.220567
## 11 0.1000745 8.607569 8.710620 8.504519
## 12 0.4361020 8.889181 9.001063 8.777300
## 13 0.7721295 9.170793 9.300412 9.041174
## 14 1.1081570 9.452405 9.605606 9.299204
## 15 1.4441845 9.734017 9.914365 9.553668
## 16 1.7802120 10.015629 10.225310 9.805947
## 17 2.1162395 10.297240 10.537641 10.056840
## 18 2.4522671 10.578852 10.850890 10.306814
## 19 2.7882946 10.860464 11.164770 10.556158
## 20 3.1243221 11.142076 11.479100 10.805051
This shows a linear effect of parenting style. Is it possible that there are non-linear effects? We can easily test this by adding squared terms to the model:
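A note on poly: poly(x, 2) returns orthogonal linear and quadratic terms rather than raw x and x^2, which keeps the two coefficients of each predictor uncorrelated:

```r
x = 1:5
p = poly(x, 2)           # orthogonal polynomial basis
round(crossprod(p), 10)  # the columns are orthonormal
# the raw specification would be poly(x, 2, raw = TRUE), i.e. x and x^2
```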
ADHD_model = lm(ADHD ~ poly(positive,2) + poly(inconsistent,2), my_data)
summary(ADHD_model)
##
## Call:
## lm(formula = ADHD ~ poly(positive, 2) + poly(inconsistent, 2),
## data = my_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.628 -4.717 -1.524 2.717 43.910
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.52433 0.05231 162.961 <2e-16 ***
## poly(positive, 2)1 -118.11879 6.90829 -17.098 <2e-16 ***
## poly(positive, 2)2 -12.87801 7.04932 -1.827 0.0677 .
## poly(inconsistent, 2)1 108.07683 7.02102 15.393 <2e-16 ***
## poly(inconsistent, 2)2 14.93442 6.93705 2.153 0.0313 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.897 on 17381 degrees of freedom
## Multiple R-squared: 0.02968, Adjusted R-squared: 0.02946
## F-statistic: 132.9 on 4 and 17381 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
hist(residuals(ADHD_model))
pcf(ADHD_model)
par(mar=c(3,3,2,1), mgp=c(2,.5,0), tck=-.01)
cplot(ADHD_model,
"positive",
what = "prediction",
ylim = quantile(my_data$ADHD,c(.025,.975)))
## xvals yvals upper lower
## 1 -3.93443222 10.734629 12.148536 9.320723
## 2 -3.70448310 10.691705 11.930576 9.452834
## 3 -3.47453399 10.638193 11.713784 9.562601
## 4 -3.24458487 10.574093 11.498286 9.649899
## 5 -3.01463575 10.499405 11.284265 9.714545
## 6 -2.78468664 10.414130 11.071986 9.756273
## 7 -2.55473752 10.318266 10.861851 9.774681
## 8 -2.32478841 10.211816 10.654475 9.769156
## 9 -2.09483929 10.094777 10.450799 9.738755
## 10 -1.86489017 9.967151 10.252210 9.682091
## 11 -1.63494106 9.828937 10.060468 9.597405
## 12 -1.40499194 9.680135 9.876872 9.483398
## 13 -1.17504282 9.520745 9.700300 9.341191
## 14 -0.94509371 9.350768 9.525895 9.175642
## 15 -0.71514459 9.170203 9.346915 8.993492
## 16 -0.48519548 8.979051 9.157714 8.800388
## 17 -0.25524636 8.777310 8.954881 8.599740
## 18 -0.02529724 8.564982 8.737074 8.392891
## 19 0.20465187 8.342066 8.504864 8.179269
## 20 0.43460099 8.108563 8.261201 7.955925
cplot(ADHD_model,
"inconsistent",
what = "prediction",
ylim = quantile(my_data$ADHD,c(.025,.975)))
## xvals yvals upper lower
## 1 -3.2602006 6.771602 7.600953 5.942252
## 2 -2.9241731 6.870619 7.544847 6.196390
## 3 -2.5881456 6.988719 7.527040 6.450397
## 4 -2.2521181 7.125902 7.548452 6.703353
## 5 -1.9160906 7.282170 7.610469 6.953871
## 6 -1.5800631 7.457521 7.714850 7.200192
## 7 -1.2440356 7.651956 7.862680 7.441231
## 8 -0.9080080 7.865474 8.051738 7.679210
## 9 -0.5719805 8.098076 8.274858 7.921295
## 10 -0.2359530 8.349762 8.523122 8.176403
## 11 0.1000745 8.620532 8.790494 8.450570
## 12 0.4361020 8.910385 9.075767 8.745003
## 13 0.7721295 9.219322 9.383197 9.055447
## 14 1.1081570 9.547343 9.722328 9.372357
## 15 1.4441845 9.894447 10.103438 9.685456
## 16 1.7802120 10.260635 10.530581 9.990690
## 17 2.1162395 10.645907 11.002072 10.289742
## 18 2.4522671 11.050262 11.515119 10.585406
## 19 2.7882946 11.473702 12.067723 10.879681
## 20 3.1243221 11.916224 12.658686 11.173763
To see how useful it can be to “program” in R, let's plot the results for all outcomes. First, we consolidate all plotting into one function:
plot_regression_results = function(m,outcome) {
par(mar=c(3,3,2,1), mgp=c(2,.5,0), tck=-.01)
par(mfrow = c(2,2))
hist(residuals(m))
pcf(m)
title(outcome)
par(mar=c(3,3,2,1), mgp=c(2,.5,0), tck=-.01)
cplot(m,
"positive",
what = "prediction",
ylim = quantile(my_data[,outcome],c(.025,.975)))
title(outcome)
cplot(m,
"inconsistent",
what = "prediction",
ylim = quantile(my_data[,outcome],c(.025,.975)))
title(outcome)
}
outcomes = c("OD","CD","SMFQ","CCC","SPRAAK")
for (o in outcomes) {
reg_mod_str = paste(o,"~poly(positive,2) + poly(inconsistent,2)")
lm_fit = lm(as.formula(reg_mod_str),
data = my_data)
plot_regression_results(lm_fit,o)
}
## xvals yvals upper lower
## 1 -3.93443222 4.334609 4.962618 3.706599
## 2 -3.70448310 4.357514 4.907778 3.807249
## 3 -3.47453399 4.371769 4.849511 3.894028
## 4 -3.24458487 4.377376 4.787871 3.966880
## 5 -3.01463575 4.374333 4.722941 4.025724
## 6 -2.78468664 4.362640 4.654838 4.070443
## 7 -2.55473752 4.342299 4.583741 4.100857
## 8 -2.32478841 4.313308 4.509922 4.116693
## 9 -2.09483929 4.275668 4.433800 4.117535
## 10 -1.86489017 4.229378 4.355992 4.102764
## 11 -1.63494106 4.174439 4.277278 4.071601
## 12 -1.40499194 4.110851 4.198235 4.023467
## 13 -1.17504282 4.038614 4.118366 3.958861
## (Console output of plot_regression_results: for each model term, a table
## of predicted values (xvals, yvals) with upper and lower confidence bounds.
## The plots display the same information.)
All these analyses suggest that mothers who report inconsistent parenting at child age 5 (they do not follow through with a threatened punishment) later report more mental health problems in their children. There are, however, several reasons not to immediately endorse a causal interpretation. This could be reverse causation: children with problems at age 8 may already have had problems at age 5, and those earlier problems may have caused the inconsistent parenting. Some would certainly argue that this is all genetic: the same genes that are responsible for mental health problems could also cause inconsistent parenting (e.g. parents with mental health problems might be more prone to inconsistency). Genes are an example of confounders: common causes of both exposure and outcome. We do not have genetic information here, but we can easily think of other potential common causes such as parental education, parental age, and parity. In addition, we might want to adjust for other covariates such as the child's gender.
The next lines of R code load these variables and add them to our data.
data_directory = "data/"
Q1 = read_sav(paste0(data_directory,
"PDB439_Skjema1_v10.sav"))
Q1 = data.frame(Q1[,c("PREG_ID_439",
paste("AA",c(11,1123:1127,1300:1303),
sep = ""))])
names(Q1)[-1] = c("Q1_YEAR","CivilStatus","mEDUcomp","mEDUcurr",
"fEDUcomp","fEDUcurr","n_people_19p_Q1",
"n_children_12_18_Q1","n_children_6_11_Q1",
"n_children_0_5_Q1")
Q1$CivilStatus = factor(Q1$CivilStatus,
levels = 1:6,
labels = c("married","separated","cohabitating",
"widow","single","Other"))
EDUlabels = c("basic","1-2 high school","vocational high","general high", "Bachelor","Master" )
Q1$mEDU = Q1$mEDUcomp
idx = is.na(Q1$mEDUcomp) & !is.na(Q1$mEDUcurr) & Q1$mEDUcurr != 1
Q1$mEDU[idx] = Q1$mEDUcurr[idx] - 1
Q1$mEDU = ordered(Q1$mEDU,labels = EDUlabels)
Q1$fEDU = Q1$fEDUcomp
idx = is.na(Q1$fEDUcomp) & !is.na(Q1$fEDUcurr) & Q1$fEDUcurr != 1
Q1$fEDU[idx] = Q1$fEDUcurr[idx] - 1
Q1$fEDU = ordered(Q1$fEDU,labels = EDUlabels)
rm(EDUlabels,idx)
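The recode above merges two sources of information: where completed education (mEDUcomp) is missing but the mother reports an ongoing education above the lowest level, we assume the level below the ongoing one has been completed. A minimal self-contained sketch of this logic with made-up values:

```r
# Toy version of the completed/ongoing education recode (made-up values).
EDUcomp = c(3, NA, NA, 5)  # completed education, levels 1-6
EDUcurr = c(NA, 4, 1, NA)  # ongoing education, 1 = none

EDU = EDUcomp
# Where completed education is missing but an ongoing education above
# level 1 is reported, assume the level below it has been completed.
idx = is.na(EDUcomp) & !is.na(EDUcurr) & EDUcurr != 1
EDU[idx] = EDUcurr[idx] - 1
EDU
## [1]  3  3 NA  5
```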
data_directory = "data/"
MFR = read_sav(paste0(data_directory,
"PDB439_MFR_520_v10.sav"))
MFR = data.frame(MFR[,c("PREG_ID_439","BARN_NR","KJONN","MORS_ALDER",
"FAAR","LEVENDEFODTE_5","PARITET_5","FLERFODSEL",
"VEKT","SVLEN_DG","APGAR1","APGAR5","FMND")])
names(MFR)[-c(1,2)] = c("gender","mAge", "birth_year",
"life_births","parity","multiple_births",
"birthweight","pregnancy_duration",
"APGAR1","APGAR5","birthmonth")
MFR$gender = factor(MFR$gender,levels = 1:2,labels = c("boy","girl"))
# MFR is a data.frame, so we use base R syntax (not data.table's :=)
MFR$life_births[is.na(MFR$life_births)] = MFR$parity[is.na(MFR$life_births)]
MFR$birthmonth = as.numeric(MFR$birthmonth)
my_data = merge(my_data,
MFR,
by = c("PREG_ID_439","BARN_NR"),
all.x = T,
all.y = F)
my_data = merge(my_data,
Q1,
by = c("PREG_ID_439"),
all.x = T,
all.y = F)
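In both merge() calls, all.x = TRUE together with all.y = FALSE produces a left join: every row of my_data is kept (with NAs where no match is found), while unmatched rows of MFR and Q1 are dropped. A small sketch with toy data frames:

```r
# merge() with all.x = TRUE keeps every row of the left data frame.
left  = data.frame(PREG_ID = c(1, 2, 3), y = c(10, 20, 30))
right = data.frame(PREG_ID = c(2, 3, 4), z = c("a", "b", "c"))

joined = merge(left, right, by = "PREG_ID", all.x = TRUE, all.y = FALSE)
joined
##   PREG_ID  y    z
## 1       1 10 <NA>
## 2       2 20    a
## 3       3 30    b
```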
Now we are ready to look at a few tables and plots:
describeBy(my_data[,c("positive","inconsistent")], group = my_data$mEDU)
##
## Descriptive statistics by group
## group: basic
## vars n mean sd median trimmed mad min max range skew
## positive 1 153 0.13 0.97 0.32 0.20 0.81 -2.76 1.58 4.35 -0.74
## inconsistent 2 153 0.31 1.11 0.16 0.29 1.10 -1.83 3.25 5.08 0.23
## kurtosis se
## positive -0.42 0.08
## inconsistent -0.50 0.09
## --------------------------------------------------------
## group: 1-2 high school
## vars n mean sd median trimmed mad min max range skew
## positive 1 384 0.23 1.00 0.53 0.33 0.80 -3.28 1.58 4.87 -0.99
## inconsistent 2 384 0.14 1.02 0.10 0.14 1.09 -2.46 3.82 6.28 0.09
## kurtosis se
## positive 0.30 0.05
## inconsistent -0.13 0.05
## --------------------------------------------------------
## group: vocational high
## vars n mean sd median trimmed mad min max range
## positive 1 1451 0.12 1.00 0.42 0.19 0.93 -3.93 1.58 5.52
## inconsistent 2 1451 0.05 1.02 0.09 0.04 1.01 -2.95 4.80 7.75
## skew kurtosis se
## positive -0.69 -0.34 0.03
## inconsistent 0.14 0.04 0.03
## --------------------------------------------------------
## group: general high
## vars n mean sd median trimmed mad min max range
## positive 1 1897 0.11 0.97 0.40 0.18 0.86 -3.63 1.58 5.21
## inconsistent 2 1897 0.00 1.04 0.02 -0.01 1.10 -3.26 3.25 6.51
## skew kurtosis se
## positive -0.66 -0.48 0.02
## inconsistent 0.07 -0.33 0.02
## --------------------------------------------------------
## group: Bachelor
## vars n mean sd median trimmed mad min max range
## positive 1 7639 -0.01 0.98 0.30 0.03 1.04 -3.57 1.58 5.15
## inconsistent 2 7639 -0.03 0.98 -0.01 -0.04 1.05 -3.21 3.25 6.46
## skew kurtosis se
## positive -0.46 -0.84 0.01
## inconsistent 0.04 -0.24 0.01
## --------------------------------------------------------
## group: Master
## vars n mean sd median trimmed mad min max range
## positive 1 5435 -0.07 1.00 0.23 -0.03 1.10 -3.67 1.58 5.26
## inconsistent 2 5435 0.01 0.99 0.09 0.02 0.98 -2.72 3.82 6.54
## skew kurtosis se
## positive -0.39 -0.93 0.01
## inconsistent -0.04 -0.18 0.01
This does not look very clear. We can also plot the raw data using the ggplot2 package. Let's start with how mothers with different educational levels differ in inconsistent parenting.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
ggplot(my_data, aes(x = inconsistent, fill = mEDU)) +
geom_histogram(aes(y = ..density..),position="dodge2", bins = 10)
This is still not very clear. We can summarize the data more by using boxplots.
ggplot(my_data, aes(x = mEDU, y = inconsistent, fill = mEDU)) +
geom_boxplot() +
stat_summary(fun.y="mean", geom="point", shape=23, size=3, fill="white")
We can also plot means and confidence intervals. We use the dplyr package to calculate group means and quantiles, and then plot those:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
my_stats = my_data %>%
group_by(mEDU) %>%
summarise(m = mean(inconsistent),
sd = sd(inconsistent),
N = n()
)
my_stats$CIlower = my_stats$m - my_stats$sd/sqrt(my_stats$N)*2
my_stats$CIupper = my_stats$m + my_stats$sd/sqrt(my_stats$N)*2
ggplot(my_stats,aes(x = mEDU, y = m, fill = mEDU)) + geom_bar(stat = "identity") +
geom_pointrange(aes(x = mEDU, ymin = CIlower, ymax = CIupper)) +
stat_summary(fun.y="median", geom="point", shape=23, size=3, fill="white")
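The intervals computed above are m ± 2·sd/√N, i.e. the mean plus or minus two standard errors; 2 approximates the normal 97.5% quantile (1.96), so this is an approximate 95% confidence interval. A self-contained check against the exact t-based interval from t.test(), using simulated data:

```r
# Approximate 95% CI (mean +/- 2 standard errors) vs. the t-based CI.
set.seed(1)
x = rnorm(500, mean = 0.1, sd = 1)

m  = mean(x)
se = sd(x) / sqrt(length(x))
approx_ci = c(m - 2 * se, m + 2 * se)
exact_ci  = t.test(x)$conf.int  # uses qt(0.975, df) instead of 2

round(approx_ci, 3)
round(as.numeric(exact_ci), 3)
# With N = 500 the two intervals agree to roughly two decimals.
```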
Altogether, this tells us that there is not a lot happening between groups. This might seem counterintuitive given the last plot, but remember that the data are scaled to have a standard deviation of 1. Only the difference between mothers with 1-2 years of high school and mothers with a Master's degree seems relatively large.
Because the data are not normally distributed, boxplots, which rely on medians and quartiles, also seem more appropriate here. So let's look at the boxplots for positive parenting.
ggplot(my_data, aes(x = mEDU, y = positive, fill = mEDU)) +
geom_boxplot() +
stat_summary(fun.y="mean", geom="point", shape=23, size=3, fill="white")
We see again that while there is some variation between groups, variation within groups is much more important.
In sum, it looks as if there are only subtle differences in (reported) parenting style between groups with different levels of maternal education.
Now let's look at the association with some other potential confounders:
breaks = c(seq(15,40, by = 5),50)
my_data$mAgeg = cut(my_data$mAge, breaks = breaks,ordered_result = T)
ggplot(my_data, aes(x = mAgeg, y = inconsistent, fill = mAgeg)) +
geom_boxplot()
my_data$parity = as_factor(my_data$parity,ordered = T)
ggplot(my_data, aes(x = parity, y = inconsistent, fill = parity)) +
geom_boxplot()
ggplot(my_data, aes(x = gender, y = inconsistent, fill = gender)) +
geom_boxplot()
These associations also do not seem particularly strong.
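The age groups used above come from cut(), which bins a continuous variable into a factor. By default the intervals are right-closed, so (15,20] contains ages above 15 up to and including 20. A minimal sketch with made-up ages:

```r
# cut() bins a continuous variable; intervals are right-closed by default.
ages   = c(16, 20, 24, 33, 41)
breaks = c(seq(15, 40, by = 5), 50)
groups = cut(ages, breaks = breaks, ordered_result = TRUE)
table(groups)
## (15,20] (20,25] (25,30] (30,35] (35,40] (40,50]
##       2       1       0       1       0       1
```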
Now we quickly look at variations of positive parenting between groups:
ggplot(my_data, aes(x = mEDU, y = positive, fill = mEDU)) +
geom_boxplot() +
stat_summary(fun.y="mean", geom="point", shape=23, size=3, fill="white")
breaks = c(seq(15,40, by = 5),50)
my_data$mAgeg = cut(my_data$mAge, breaks = breaks,ordered_result = T)
ggplot(my_data, aes(x = mAgeg, y = positive, fill = mAgeg)) +
geom_boxplot()
my_data$parity = as_factor(my_data$parity,ordered = T)
ggplot(my_data, aes(x = parity, y = positive, fill = parity)) +
geom_boxplot()
ggplot(my_data, aes(x = gender, y = positive, fill = gender)) +
geom_boxplot()
We see again that while there is some variation between groups (with the exception of gender), the variation within groups is much larger.
To conclude this section, we also look at one of the outcomes:
my_stats = my_data %>%
group_by(mAgeg) %>%
summarise(m = mean(ADHD),
md = median(ADHD),
sd = sd(ADHD),
N = n()
)
my_stats$CIlower = my_stats$m - my_stats$sd/sqrt(my_stats$N)*2
my_stats$CIupper = my_stats$m + my_stats$sd/sqrt(my_stats$N)*2
ggplot(my_stats,aes(x = mAgeg, y = m, fill = mAgeg)) + geom_bar(stat = "identity") +
geom_pointrange(aes(x = mAgeg, ymin = CIlower, ymax = CIupper)) +
geom_point(aes(x = mAgeg, y = md), col= "white", size = 2 )
my_data = my_data[complete.cases(my_data[,c("mEDU","mAge","parity","gender")]),]
outcomes = c("OD","CD","SMFQ","CCC","SPRAAK")
my_data$parity_n = as.numeric(my_data$parity)
my_data$mEDU_n = as.numeric(my_data$mEDU)
for (o in outcomes) {
reg_mod_str = paste(o,"~poly(positive,2) + poly(inconsistent,2) + poly(mEDU_n,2) + poly(mAge,2) + poly(parity_n,2) + gender")
lm_fit = lm(as.formula(reg_mod_str),
data = my_data)
plot_regression_results(lm_fit,o)
}
## (Console output of plot_regression_results for each outcome in the loop:
## tables of predicted values (xvals, yvals) with upper and lower confidence
## bounds per model term. The plots display the same information.)
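The regression loop above builds a formula string with paste() and converts it with as.formula(); poly(…, 2) adds orthogonal linear and quadratic terms, which lets the fit pick up curvature in an association. A self-contained sketch on simulated data (the variable names x and y are illustrative):

```r
# Building a formula as a string and fitting linear + quadratic terms.
set.seed(2)
d = data.frame(x = rnorm(300))
d$y = 1 + 0.5 * d$x - 0.3 * d$x^2 + rnorm(300, sd = 0.5)

mod_str = paste("y", "~ poly(x, 2)")
fit = lm(as.formula(mod_str), data = d)
round(summary(fit)$coefficients, 3)
# The second poly() term picks up the quadratic (-0.3 * x^2) component.
```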